DropDim: A Regularization Method for Transformer Networks

Abstract

We introduce DropDim, a structured dropout method designed for regularizing the self-attention mechanism, which is a key component of the transformer. In contrast to the general dropout method, which randomly drops neurons, DropDim drops part of the embedding dimensions. In this way, the semantic information carried by those dimensions can be completely discarded. Thus, the excessive co-adaptation between different embedding dimensions can be broken, and the self-attention is forced to encode meaningful features with a certain number of embedding dimensions erased. Experiments on a wide range of tasks executed on the MUST-C English-German dataset show that DropDim can effectively improve model performance, reduce over-fitting, and show complementary effects with other regularization methods. When combined with label smoothing, the WER is reduced from 19.1% to 15.1% on the ASR task, and the BLEU value is increased from 26.90 to 28.38 on the MT task. On the ST task, the model reaches a BLEU score of 22.99, an increase of 1.86 BLEU points compared to a strong baseline.
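
The contrast the abstract draws between element-wise dropout and dropping whole embedding dimensions can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `drop_dim` and `element_dropout`, the per-sequence sharing of the mask, and the inverted-dropout rescaling by 1/(1-p) are all assumptions made for illustration, and the abstract does not specify where in the self-attention block the mask is applied.

```python
import random


def drop_dim(x, p, rng):
    """Zero whole embedding dimensions for every token in a sequence.

    x   : list of token vectors, each of length d (seq_len x d)
    p   : probability of dropping each embedding dimension
    rng : a random.Random instance
    Returns the masked sequence and the keep-mask that was sampled.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    d = len(x[0])
    # One keep/drop decision per embedding dimension, shared by all tokens,
    # so a dropped dimension is erased across the whole sequence.
    keep = [rng.random() >= p for _ in range(d)]
    scale = 1.0 / (1.0 - p)  # assumption: inverted-dropout rescaling
    y = [[v * scale if k else 0.0 for v, k in zip(row, keep)] for row in x]
    return y, keep


def element_dropout(x, p, rng):
    """Standard dropout for contrast: an independent decision per element."""
    scale = 1.0 / (1.0 - p)
    return [[v * scale if rng.random() >= p else 0.0 for v in row] for row in x]


rng = random.Random(0)
x = [[1.0] * 8 for _ in range(4)]  # 4 tokens, 8 embedding dimensions
y, keep = drop_dim(x, p=0.25, rng=rng)
z = element_dropout(x, p=0.25, rng=rng)
```

In `y` every token has zeros in exactly the same columns, whereas in `z` the zeros are scattered independently per element, so other tokens can still carry the information held in any given dimension.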

Similar resources

A regularization method for solving a nonlinear backward inverse heat conduction problem using discrete mollification method

The present essay scrutinizes the application of discrete mollification as a filtering procedure to solve a nonlinear backward inverse heat conduction problem in one-dimensional space. These problems are seriously ill-posed. So, we combine discrete mollification and the space marching method to address the ill-posedness of the proposed problem. Moreover, a proof of stability and ...

Applying Legendre Wavelet Method with Regularization for a Class of Singular Boundary Value Problems

In this paper, Legendre wavelet bases are used to find approximate solutions to singular boundary value problems arising in physiology. When the number of basis functions is increased, the algebraic system of equations becomes ill-conditioned (because of the singularity); to overcome this for large $M$, we use a form of Tikhonov regularization. Examples from applied sciences are pr...

A Novel Method for Designing and Optimization of Networks

In this paper, system planning network is formulated with mixed-integer programming. Two meta-heuristic procedures are considered for this problem. The cost function of this problem consists of the capital investment cost in discrete form, the cost of transmission losses and the power generation costs. The DC load flow equations for the network are embedded in the constraints of the mathematica...

Introducing a New Method for Multiarea Transmission Networks Loss Allocation

Transmission loss allocation in very large networks with multiple interconnected areas or countries is investigated in this paper. The main contribution is a method to calculate the amount of losses due to the activity of each participant in multi-area markets. Pricing of cross-border trades in multi-area systems is often difficult since individual countries may use incompatible ...

Regularization for Neural Networks

Research into regularization techniques is motivated by the tendency of neural networks to learn the specifics of the dataset they were trained on rather than general features that are applicable to unseen data. This is known as overfitting. The goal of any supervised machine learning task is to approximate a function that maps inputs to outputs, given a dataset of examples and labels....

Journal

Journal title: IEEE Signal Processing Letters

Year: 2022

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2022.3140693